A "partitioned leaping" approach for multiscale modeling of chemical reaction 

dynamics 

Leonard A. Harri^ and Paulette Clanc}|l| 
School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA 

(Dated: February 2, 2008) 

We present a novel multiscale simulation approach for modeling stochasticity in chemical reaction 
networks. The approach seamlessly integrates exact-stochastic and "leaping" methodologies into 
a single partitioned leaping algorithmic framework. The technique correctly accounts for stochas- 
tic noise at significantly reduced computational cost, requires the definition of only three model- 
independent parameters and is particularly well-suited for simulating systems containing widely 
disparate species populations. We present the theoretical foundations of partitioned leaping, dis- 
cuss various options for its practical implementation and demonstrate the utility of the method via 
illustrative examples. 



I. INTRODUCTION 

Stochastic simulations of chemical reaction networks 
have become increasingly popular recently, largely due to 
the observation that stochastic noise plays a critical role 
in biological functioujii^i^iiii^i^i^ The issue is relevant in 
other scientific fields as well, such as microelectronics pro- 
cessing, where statistical variations in dopant profiles can 
profoundly affect the performance of nanoscale semicon- 
ductor devices i^iiSiiii Gillespie's stochastic simulation al- 
gorithm (SSA)fi^ in particular, has found widespread use 
in computational biology. The method is a kinetic Monte 
Carlo approach that produces time-evolution trajectories 
correctly accounting for the inherent stochasticity asso- 
ciated with molecular interactions. Detailed accuracy is 
achieved by explicitly simulating every reaction occur- 
rence within a system. The method is computationally 
expensive as a result, however, and practical application 
is limited to only very small systems. 

Motivated by this fact, considerable effort has been 
undertaken recently to develop accelerated simulation 
methods capable of correctly accounting for stochastic 
noise but at significantly reduced computational cost. 
Broadly speaking, these endeavors can be divided into 
three categories: (i) algorithmic advances to increase the 
efficiency of exact-stochastic methods ; ^'^'^"^d^ (ii) "leap- 
ing" techniques in which efficiency is enhanced by ig- 
noring the exact moments at which reaction events oc- 
cur^OiisOS^^^^^ and (iii) "partitioned" methods 
in which sets of reactions are divided into various classifi- 
cations, such as "fast" and "slow," and treated either by 
applying appropriate numerical techniques to each sub- 
set ("methodology coupling" or "hybrid" 
or by reducing the model to incorporate the effects of 
the fast reactions into the dynamics of the slow ( "model 
reduction" 

In this article, we present a new simulation method 
for modeling chemical reaction dynamics that inte- 
grates exact-stochastic, leaping and methodology cou- 
pling methods into a single multiscale algorithmic frame- 
work. The fundamental theory underlying Gillespie's r- 
leap approach-'"- ' — is used to formulate a theoretically 



justifiable partitioning scheme. The partitioning is based 
on the number of reaction firings expected within a cal- 
culated time interval and partitioning and time step de- 
termination are thus inextricably linked. Based on the 
value of the time step each reaction is then "classified" 
at one of various levels of description. Species popu- 
lations are updated for coarsely classified reactions us- 
ing the leaping formulas introduced by Gillespie, while 
rare events are treated via the methods of Gibson and 
Bruck's exact-stochastic Next Reaction method^i^, Over- 
all, the technique efficiently simulates systems containing 
widely disparate species populations using rigorously de- 
rived classification criteria and requiring minimal user 
intervention. 

We begin in Sec.|TT]by reviewing the theoretical founda- 
tions of exact-stochastic simulation and r-leaping. This 
provides a basis for discussing partitioned leaping in 
Sec. mil as well as various options for its practical im- 
plementation. In Sec. IIVI we present two illustrative ex- 
amples demonstrating the utility of the method: one in- 
spired by biology and a clustering example relevant to 
materials and atmospheric sciences. Finally, in Sec. |V] 
we summarize the attributes of the method, discuss pos- 
sible modifications, and draw conclusions regarding its 
place among the many alternative techniques that have 
been proposed. 



II. BACKGROUND 

As is customary, we consider a well-mixed system of 
N molecular species {Si, . . . , Sn} interacting via M re- 
action channels {Ri, ■ ■ ■ , Rm} in a volume fl at constant 
temperature. The state of the system is described by 
the vector X(f), where Xi{t) represents the population 
of species Si at time t. Each reaction channel i?^ has 
associated with it a propensity function and a state- 
change (or stoichiometric) vector ~ i^ni^ ■ • ■ i z^n)/^ 
The propensity is defined such that a^^dt gives the 
probability that reaction i?^ will fire once during the in- 
finitesimal time interval dt. In other words, is the 
stochastic analog to the deterministic reaction rate. As 
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such, can be written as the product of a stochastic 
rate constant (which is related to the deterministic 
rate constant via a simple scaling by the system vol- 
ume) and a combinatorial factor /i^, which represents 
the number of distinct ways in which a realization of i?^ 
can occur. In general, /i^ is a simple function of the 
reactant species populations for 

A. Exact stochastic simulation 



is used, where the unprimed and primed quantities sig- 
nify new and old values, respectively 4^ 

It should be noted that a recent timing analysia^^ has 
shown that, while the NRM is certainly more efficient 
than the FRM, an optimized version of the DM actually 
performs best in most situations. For our purposes, how- 
ever, this fact is not important. The ideas underlying the 
FRM are what we will use in the development of our new 
simulation approach, and the increased efficiency offered 
by the NRM will be utilized in its implementation. 



Given the definitions above, Gillespie has developed 
a simulation methodology for modeling chemical reac- 
tion dynamics that "exactly" accounts for the stochastic 
nature of the process lii^ The stochastic simulation algo- 
rithm, or SSA, is exact in the sense that it produces pos- 
sible time-evolution trajectories that are consistent with 
the underlying chemical Master Equation governing the 
physical processJ^ 

The SSA operates by generating, at each simulation 
step, random samples of reaction times (t) and types (/i) 
and advancing the system forward in time accordingly. 
Two alternative, yet equivalent, strategies exist for doing 
so. The first, dubbed the Direct method (DM), generates 
T according to^^ 

T = -ln(ri)/ao, (1) 
while n is the integer satisfying the relationship 

^ < aor2 < ^fli., (2) 

where oq = X^i/ ^'^'i '"i ^^'^ ^2 are unit-uniform ran- 
dom numbers between and 1. 

The second approach considers each reaction in the 
system on an individual basis and asks the question, 
"When will reaction next fire assuming that no other 
reactions fire first?" Values of such "tentative" next- 
reaction times, T^^,^*' are then generated for each reac- 
tion. T is set equal to the smallest of these and /i to the 
corresponding reaction since this is the only one for which 
the assumption that no other reactions fire first actually 
holds. This approach was originally developed by Gille- 
spie^^ and dubbed the First Reaction method (FRM). 
Tentative next-reaction times are calculated via 

rf--ln(r^)K, (3) 

where is a unit-uniform random number. 

This approach has recently been modified by Gibson 
and Bruck.^'^ The Next Reaction method (NRM) is more 
computationally efficient in that a rigorous transforma- 
tion formula is employed that significantly reduces the 
number of random number generations required during 
the course of a simulation. Once a reaction has fired, a 
new value of r^® is generated for that reaction only using 
Eq. ([3]). For all other reactions 

= KK)«^'-r'), (4) 



B. T- leaping 

Despite the improved efficiency offered by the NRM— 
and the optimized version of the DM,^"* the SSA remains 
limited as to the system size amenable to treatment due 
to its "one reaction at a time" nature. As a result, Gille- 
spie has recently attempted to move beyond the "exact" 
approach by introducing approximations regarding the 
number of times a reaction can be expected to fire within 
a given time interval. His approach, known as r-leaping, 
begins by defining the quantity K^{t) as the number of 
times reaction channel R^ fires during the time interval 
[i, t4-T)ji^i22 In general, K^(t) is a complex random vari- 
able dependant upon the propensity and the manner 
in which it changes during [i, t + r). Obtaining a rigorous 
expression for the probability function governing (r) is 
thus tantamount to solving the usually intractable chem- 
ical Master Equation. 

Gillespie recognized, however, that if a time period 
exists over which the propensity remains essentially 
constant then one can approximate K^{t) as a Poisson 
random variable r^'^ 

K^{T)^V^{a,,T). (5) 

There always exists a value of r over which this assump- 
tion holds; in the extreme case it is the interval between 
successive reactions. In many cases, however, the inter- 
val is likely to span numerous reaction events, especially 
when the reactant populations are "large."— Gillespie 
then noted that if the mean value of 7'p(a^,r), i.e., a^r, 
is "large," then one can approximate the Poisson random 
variable as a normal random variable ; ^^'^^ 

K^ir) w J\f^{a^T,a^j^T) (6) 
= a^T + ^a~r X 7V(0, 1), 

where the second equality follows from the linear combi- 
nation theorem for normal random variables. ^^'"^^ Equa- 
tion ([6]) is essentially a chemical Langevin equation and 
amounts to a "continuous-stochastic" representation of 
the reaction dynamics. Finally, Gillespie showed that if 
the ratio of the "deterministic" term in a^r, to the 
"noise" term, .^a^r, is "large," then the noise term can 
be neglected, leavingi^*^ 

K,{t) « a^T, (7) 
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or a "continuous-deterministic" representation. 

The expressions in Eqs. (l5])-([7|) represent a theo- 
retical "bridge" connecting the discrete-stochastic rep- 
resentation of reaction dynamics and the more famil- 
iar continuous-deterministic description. With reactions 
classified at one of these levels of description, these for- 
mulae are used in r-leaping to determine the number of 
times each reaction fires within a given time interval. One 
is thus able to "leap" forward in the temporal evolution 
of a system multiple reaction firings at a time. 

Implementing these ideas algorithmically requires a 
practical method for determining the time interval over 
which the Poisson approximation ([5]) can be expected to 
hold, r-selection has been the subject of extensive re- 
searcbi^ii^'^ and has undergone numerous refinements 
recently. The basic idea is to impose a constraint on 
the magnitude of the change of an individual reaction 
propensity, 

|«M(^ + ^p^'="')-«MW|/e = e, (o<6«i), (8) 

where ^ is an appropriate scaling factor. In applying 
this constraint, one seeks to identify the time interval 
^Leap Q.^gj, propensity for reaction i?^ will 

remain essentially constant within a factor of e assuming 
that the propensities for all other reactions also remain 
essentially constant. One then, in essence, calculates a 
value of r^^'*? for each reaction in the system and sets r 
equal to the smallest of these. 

We will discuss r-selection further in Sec. IIII Bl For 
now, however, note that there is an interesting analogy 
between the procedure used in r-leaping and that in the 
FRM (and NRM by extension). In both cases, a time 
interval is calculated for a specific reaction channel with 
assumptions made regarding all other reaction channels. 
The time step is then set equal to the smallest of the set as 
it is the only one for which the assumptions actually hold. 
This suggests that the two methods might be seamlessly 
merged in some way, and forms the basis of our new 
simulation approach. 

With the time step r calculated, one then proceeds to 
determine the number of times each reaction fires in r 
using Eqs. ©-([T]). Strictly speaking, the r-leap method 
uses only Eq. ([5]). If the propensities of all reactions 
are such that {a^r} ^ 1, however, then Eq. ^ is em- 
ployed. In Ref. [l^ this is termed the Langevin method. 
Furthermore, if all {^/a^} ^ 1 then Eq. ([7]) is employed, 
which is equivalent to the explicit Euler method for solv- 
ing ordinary differential equationsJ^ Finally, a proviso is 
addedi^ whereby the SSA is used if r < 1/ao, since I/oq 
is the expected time to the next reaction firing in the 
systemji^ 

Additional modifications to r-leaping have been in- 
troduced recently. Primarily, strategies have been de- 
veloped to prevent the possible occurrence of negative 
species populations. This possibility arises from the fact 
that Poisson and normal random variables are positively 
unbounded and, while unlikely, could produce physically 
unrealizable numbers of reaction firings that result in the 



consumption of more reactant entities than are present 
in the system. Tian and Burrage^^. and Chatterjee et 
al^ avoid this problem by using binomial random num- 
bers rather than Poisson. Cao et aL^'~' identify "critical" 
reactions deemed in danger of exhausting their available 
reactant populations and treat them using a DM SSA 
approach. The interested reader is referred to Ref. [25 for 
further details regarding the current state of r-leaping 
methodologies. 



III. PARTITIONED LEAPING 
A. The Framework 

The primary change that we make to r-leaping involves 
classifying reactions individually once the time step r has 
been determined. The expressions ©-([7]) are derived for 
individual reactions and there is nothing precluding their 
use in this manner. Thus, sets of reactions can essen- 
tially be partitioned into "fast," "medium," and "slow" 
classifications based on the quantities {a^r}. We can 
apply Gillespie's proviso^ to each individual reaction as 
well, essentially classifying a reaction as "very slow" if 
r ^ A tentative next-reaction time for such a re- 

action can then be generated from Eqs. ([3]) or ([l]) and 
Rfj, deemed to fire if r^^ < r. 

This procedure amounts to a theoretically justifiable 
partitioning scheme in which reactions are classified into 
four different categories based on their propensity values, 
the calculated time step r, and the criteria identified by 
Gillespiei^i^ for transitioning between the descriptions 
©-([7]). The classifications are made as follows: 

• If a^r 1 ~^ Exact Stochastic (very slow) 

• If a^r > 1 but ^ 1 ^ Poisson (slow) 

• If a^r 3> 1 but ^/aj/f ^ 1 ^ Langevin (medium) 

• If y^Of^T ^ 1 ^ Deterministic (fast) 

These classifications constitute the basis of the parti- 
tioned leaping approach. At each simulation step, a time 
step r is calculated and each reaction classified in the 
manner outlined above. The numbers of reaction firings 
are then determined, based on these classifications, using 
Eqs. Q-Q and the system evolved accordingly. 

Various technical issues must be considered, however, 
in order to correctly implement this approach. The first 
involves the inclusion of the "Exact Stochastic" (ES) clas- 
sification and the random nature of r^^ . Specifically, if 
r^^ < r, and the clock is subsequently advanced by r, 
then the possibility of i?^ firing again within the interval 
(r — r^^) is precluded. To overcome this complication 
we employ an iterative procedure for determining r and 
classifying reactions. Once the reaction classifications are 
made r.^^ values are calculated for all ES reactions. Some 
of these may be smaller than r, and r is thus adjusted 



4 



to the smallest of these. Decreasing r will result in de- 
creased values of {a^r} and each reaction must thus be 
reclassified. Some reclassified reactions may become ES 
which previously were not and values must thus be 
calculated for all of these reactions. Again, these may be 
smaller than t and the procedure is thus repeated until 
no further adjustments are necessary.-4^ 

Situations may also arise in which all values of are 
larger than r. In this case, we leave r unchanged unless 
all reactions in the system are classified as ES. This is 
because when all reactions are ES it is always safe to al- 
low T to increase to the time at which the next reaction 
will fire in the system. When more coarsely classified 
reactions are present, however, this is not always so. In 
situations where the coarse reactions are independent of 
the ES reactions it is safe to increase r. Automatically 
and efficiently differentiating between this situation and 
that in which it is not acceptable to increase r is not triv- 
ial, however, especially when considering large, complex 
reaction networks. 

Another issue to consider concerns the proper use of 
Eq. (lU in our algorithm. The expression in (|4|) is a 
transformation formula in which a new value of is 
calculated using the new value of a^, the old value of 
ttfi, the old value of t^^, and the old time step. Under 
normal circumstances, the "old" values are those from 
the pre vious simulation step. As discussed in note 14 of 
Ref. however, this is not always the case. Specifically, 
when a reaction becomes inactive, = and r^^ = oo. 
Upon reactivation, these values clearly cannot be used in 
Eq. ([4]). Thus, the values of a^, rjf^ and r are used from 
the last simulation step at which was active. Values 
of {aKrl^^ — r')} are stored for all reactions that did 
not fire at the current step and used in Eq. (|4]) at the 
next step at which the corresponding reactions are ac- 
tive. Usually this is the subsequent step, but sometimes 
it is not. 

In our approach, we must account for the fact that 
reactions classified as ES can change their classification 
in the subsequent step and then return to an ES clas- 
sification at a later step. Although the reaction does 
not become inactive in this case, the same procedure of 
"carrying over" the values of a^, r^^ and r is used. We 
simply extend the procedure described above, therefore, 
and use the stored values of {a[,(T^^^ — t')} at the next 
step at which the corresponding reactions are active and 
are classified as ES. 

We also consider the problem of negative populations. 
We avoid this problem by simply tracking the state of 
the system and reversing the population updates if any 
populations are found to be negative after all reaction 
firings have been accounted for. The value of t is then 
reduced (which is always acceptable) and all reactions 
reclassified. We reduce r by 50%, but any amount should 
suffice. A reduction in r will result in smaller numbers of 
reaction firings and, hopefully, alleviation of all negative 
populations. If not, then the procedure is repeated until 
no further adjustments are necessary4^ 



This approach is equivalent to the simple "try again" 
procedure discussed and implemented in Ref. [13 as a sec- 
ond layer of protection against the occurrence of nega- 
tive populations. This is used in conjunction with the 
primary strategy of identifying critical reactions. Note, 
however, that the critical reaction strategy essentially 
amounts to a partitioning of reactions into ES and "Pois- 
son" classifications with the intent of avoiding negative 
populations by maintaining a fine-level description of re- 
actions with small reactant populations. Our method 
already accomplishes this via inclusion of the ES classifi- 
cation and thus avoids the need to introduce additional 
calculations or parameters (see Sec. |V] for a further dis- 
cussion). 

Finally, in determining the set of reaction firings, use 
of Eqs. (O and (O for "Langevin" and "Deterministic" 
reactions, respectively, will result in values that are real 
numbers rather than integers. Since it is difficult to de- 
termine at what point a continuous population descrip- 
tion is acceptable in lieu of an integer description, we 
choose to round these values before updating the species 
populations. In Ref. [l^ it was argued that use of Eq. Q 
as opposed to (O is an improvement computationally 
since generation of normal random numbers is faster than 
Poisson random numbers. Some of this improvement 
is clearly negated, therefore, by including a subsequent 
rounding operation, although we have yet to quantita- 
tively investigate the extent of this effect. While the 
same argument holds for "Deterministic" reactions, the 
elimination of the random number generation operation 
should more than compensate for the added rounding 
procedure. 

With these issues accounted for, the partitioned leap- 
ing algorithm (PLA) is presented as follows: 

1. Initialize (species populations, rate constants, de- 
fine ~ 1, 1, e, etc.). 

2. Determine the initial value of r (see Sec. IIII Bt . 

3. Classify all reactions (not already classified as ES) 
using the criteria presented above. 

4. For all (newly classified) ES reactions, calculate 
tentative next-reaction times, r^^, using Eqs. ([3]) 
and Q. 

5a. If Min{Tj^^} ^ r and all reactions are ES, set r = 
Min{T^S}. 

5b. If Min{T^S} < t, set r = Min{r^^} and return to 
step 3. 

6. Determine the set of reaction firings {k^ir)} using 
the appropriate formulas^'^ and update the species 
populations. 

7. If any Xi{t+T) < 0, reverse all population updates, 
set T — t/2 and return to step 3. If not, advance 
the clock by r and return to step 2 if stopping cri- 
terion not met. 
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It is important to recognize the minimal user interven- 
tion required to implement this approach. Once the re- 
action network is defined and the associated rate param- 
eters set, one need only define three model- independent 
parameters quantifying the concepts ~ 1 (for ES classifi- 
cation) , ^ 1 (for coarse classifications) and ^ 1 (defining 
e). The algorithm then automatically and dynamically 
partitions the reactions into various subsets, correctly ac- 
counting for stochastic noise and "leaping" over unimpor- 
tant reaction events. 



B. Determining the Initial Time Step 

In practice, there are two alternate approaches for car- 
rying out Step 2 of the PLA. The first is a reaction-based 
approach in which values of r^^^^^ are calculated for each 
reaction in the system using the constraint expression 
in dHl) directly,i^ii2i2^ r is then set equal to Min{T^'='^P}. 
Older versions did this with ^ = gq in (Or^iii but the 
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current approach uses ^ 
is given by^ 



22 



Leap 



Min ■ 



In this approach, r^f'^P 



(9) 



Max{ea^,/3^} 



M 



M 



N 



Here, represents the minimum possible change in 
the propensity a^. This quantity is used to overcome 
problems associated with small reactant populations. In 
Ref. m, = Cfj,, the stochastic rate constant for i?^. For 
first-order reactions this definition is exact. For higher- 
order reactions, however, this is a lower bound: If the 
propensity changes at all it will do so by an amount 
> Thus, this approach is simple but may also be 

restrictive. We propose an alternative: An approximate 
but less restrictive value of is the smallest non-zero 
value of {da^/ dXj} , where j indexes all reactant species 
involved in R^. If all values of {da^/ dXj} are zero, how- 
ever, then 13^ — c^. 

The second approach for determining r constrains the 
relative changes in the species populations in such a way 
that ([5]) is satisfied for all reactions. r is then set equal 
to Min{Tf'=''P}, where2^ 



Leap 



Min 



Max{eX,/g„ 1} 



(10) 



mi{t) = y^z^ia^(t), 

v=l 
M 



The parameter gi depends on the highest-order reaction 
that species Si is involved in and can be determined by 
simple inspection during initialization;^^ The expressions 
in require fewer computational operations than those 
in ([9]) and each t]''^^^ calculation should thus be signifi- 
cantly faster than each t^'^^p. 

In Ref. 1221 expressions for gi are presented for all re- 
action types up to third order. For reactions involv- 
ing more than one entity of a certain species, such as 
2Si — > products, gi is a function of Xi. These expressions 
have clear upper and lower bounds, however. Thus, in 
order to simplify the calculations we use the more restric- 
tive, but computationally simpler, upper bounds. We 
determine gi as follows: 



If Si is a reactant in 



then gi ~ 



35,; 



25, 
5, 



25, 



Si + Sj + Sk 
25, 
Si -\- Sj 



products, 
products, 

■ products, 

■ products, 

■ products, 
products, 
products. 



11/2, else 
9/2, else 

3, else 



zero reactions. 



Note that the last item will result in T, 
will never be Min{Tf°^P}. 



2, 
1, 


Leap 



else 
else 



oo, which 



IV. EXAMPLES 

With the presentation of our simulation approach now 
complete, we will demonstrate in this section the util- 
ity of the method, in terms of efficiency and accuracy, 
via two illustrative examples. We will begin by consid- 
ering a simple model of clustering that illustrates the 
algorithm's ability to treat systems in which species pop- 
ulations vary over many orders of magnitude. A biologi- 
cally inspired model system will then be considered that 
illustrates how the stochastic process of gene expression 
can be accurately and efficiently simulated in conjunc- 
tion with reactions involving large reactant populations 
(e.g., metabolic processes). 



A. Simple clustering 

Clustering processes are inherently multiscale since 
large numbers of small clusters generally coexist within 
a system with small numbers of large clusters. As such. 
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clustering provides an ideal way to demonstrate the abil- 
ity of the PLA to treat systems in which species popula- 
tions vary over many orders of magnitude. 

We have considered a simple clustering model com- 
prised of the following nine reactions: 



(a) 



Ri 
R2 
R3 

Rq 



2Si 
S1 + S2 

Si + Sg 



S2 
S3 

Si 



(11) 



s 
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For the sake of simplicity we have neglected dissociation 
reactions and assume that monomers (Si) are the only 
mobile species in the system. Furthermore, in order to 
confine the multiscale effects to variations in the species 
populations alone, we have chosen deterministic rate con- 
stants such that their stochastic counterparts are equiva- 
lent for all reactions. For i?i we choose 3.0 x 10^ s~^, 
and for all other reactions 6.0 x 10^ M'^ s^^. We set 
the initial monomer concentration to 1.66 x 10^^ M and 
consider various system volumes f2 ranging from 10^^'"' 
to 1G~^ L. This corresponds to initial monomer popula- 
tions ^i(O) = 10^ to 10^ and stochastic rate constants 
{c^} = 10"^ to 10~^ s^-^. All simulations were run until 
consumption of all monomers was complete. 

In Fig. [Tl we compare average numbers of simulation 
steps and CPU times required for PLA and SSA simula- 
tions of at all system sizes considered. In general, 
the CPU times follow the same trends as the simulation 
steps and the reaction-based calculations (PLA-RB) re- 
quire more steps, and more CPU time, than the species- 
based (PLA-SB). We also see that for the smallest sys- 
tem size the SSA and PLA give identical results, meaning 
that all reactions were classified as ES at all steps of the 
PLA simulations. As the system size increases, however, 
increased amounts of leaping are observed. The effect 
is modest for system sizes of lO"^"* and 10"^'^ L, but in- 
creases dramatically beyond that. Moreover, the number 
of steps actually decreases for PLA simulations of sys- 
tems larger than ~ 10""'^^ L. The effects of leaping thus 
overtake the system size effects for these large systems. 

In Fig. [21 we show the classifications achieved for each 
reaction in (jlip at each simulation step of a single PLA- 
RB-3% simulation at a system size of 10~^ L. These 
results show leaping in action and illustrate the inher- 
ent multiscale nature of the reaction network. Reac- 
tions involving the smallest clusters (i.e., R1-R4) ex- 
perience extensive amounts of leaping for much of the 
simulation, with classifications varying rapidly between 
ES, "Poisson," "Langevin" and, at times, "Determinis- 
tic." Conversely, reactions involving larger clusters expe- 
rience much less leaping. Reactions Rj-Rg, for exam- 
ple, experience only small amounts of leaping up into the 
"Langevin" regime. 

As for accuracy, smoothed frequency histograms (see 
Appendix) were generated from 10 000 simulation runs 
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FIG. 1: Average numbers of steps (a) and CPU times (b) 
required for 10 000 PLA and SSA simulation runs of the sim- 
ple clustering model (|lip [PLA-RB-1% means reaction-based 
r-selection with e = 0.01, etc.]. Reaction classifications were 
made in the PLA runs using « 1 = 3 and 3> 1 = 100. All 
simulations were performed on a 1.80 GHz Athlon processor. 



of the SSA and the PLA for various cluster sizes (the 
populations of which vary by as many as four orders of 
magnitude) at various system sizes. Modified versions of 
the histogram distance D and the self distance i^gg'^, in- 
troduced by Cao and Petzold^ were then used to make 
quantitative comparisons. In all cases, the calculated 
histogram distances were smaller than the SSA self dis- 
tances, indicating excellent accuracy (see the Appendix 
for further details). 



B. Stochastic gene expression 

The origins and consequences of stochasticity in bi- 
ological systems has been a subject of intense inter- 
est recently jii^i^ii^i^iii^ In cellular systems, the primary 
source of "intrinsic" stochastic noise is gene expression, 
where the small numbers of regulatory molecules in- 
volved in the process result in proteins being produced 
in "bursts" rather than continuously.^'-^- Other cellular 
processes, such as metabolism, often involve large num- 
bers of molecules and it has been shown that stochas- 
tic fluctuations in gene expression can quantitatively af- 
fect these dynamics. A fully stochastic treatment of 
biological systems involving both gene expression and 
metabolism is infeasible,"*^ however, and thus provides 
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FIG. 2: Classifications vs. simulation step for each reaction of 
the simple clustering model (|lip at = 10~® L [i.e., Xi(0) = 
10®]. Classifications are: (1) Exact Stochastic, (2) Poisson, 
(3) Langevin, (4) Deterministic. Results are for a single PLA- 
RB-3% simulation using 1 = 3 and 3> 1 = 100. 
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motivation for developing multiscale simulation meth- 
ods capable of treating systems involving both large- and 
small- number dynamics. 

We have applied the PLA to a simple biologically in- 
spired model system that involves both gene expression 
and protein-protein interactions. The network that we 
have considered is as follows: 



Ri 
R2 
R3 

i?4 
i?5 



G 

G* 

P + Q 

{P--Q} 
R 



G* 

G + nP 

{P--Q} 

R + Q 



(12) 



The first two reactions constitute the gene expression 
part of the network, where the single gene G sponta- 
neously converts into an active conformation G* that 
then produces proteins P in bursts of n. The third and 
fourth reactions constitute the protein-protein enzymatic 
part of the network where P interacts with Q to form an 
enzyme-substrate complex {P:Q} that then produces R 
and regenerates Q. The final reaction models the degra- 
dation of R. 

Rate constants for the five reactions in (fT2)) were set 
equal to 750 s-\ 750 s-\ 6.0x10^ M'^ s"i, 100 s'^, and 
50 s^^, respectively. The initial enzyme concentration 
[Q]o was set equal to 1.66x10"'^ M, n to 0.2Xq{0),'^^ and 
investigations were carried out for various system sizes 
ranging from 10~^^ to 10"'' L. This corresponds to initial 
enzyme populations Xq{Q) ranging from 10^ to 10^°. In 
all cases, the system began with a single entity of G and 
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FIG. 3: Results for typical PLA-RB-3% simulations (-1 = 3, 
^ 1 = 100) of the simple gene expression model (|12p : (a) A 
typical time course at Q, = 10^^^ L, (b) value of the time 
step at each simulation step for the same run as in (a), (c) 
classifications vs. simulation step for reaction R3 at various 
system sizes. 



null populations of G* , {P : Q} and R. All simulations 
were run until t = 1 s. 

In Fig. [21 we show typical results for PLA-RB-3% sim- 
ulations of . The time course plot in Fig. ^a.) , gener- 
ated for a system size of 10~^^ L, illustrates the stochastic 
nature of the gene expression dynamics, apparent in the 
noisy time evolution of the gene product P. Conversely, 
the large-number dynamics result in much smoother tra- 
jectories for Q, {P : Q} and R. The time step plot in 
Fig. [S{b), taken from the same simulation as in[3Ka), il- 
lustrates the algorithm's ability to dynamically adjust as 
the system evolves. The first 3000 simulation steps cor- 
respond to '--^ 0.01 s of simulated time, while a time of 
0.1 s is reached around step 6400. The entire simulation 
takes 9192 steps to complete. Thus, simulating the first 
1% of the evolved time required ^33% of all simulation 
steps and the first 10% ~70% of all steps. 

In Fig. [3Kc) , we see how the classifications coarsen with 
increasing system size. For il — lO^^"^ L the classifications 
for i?3 never reach beyond "Poisson." For U — IQ^^^ L the 
classifications coarsen approximately midway through 
the simulation, and for — 10"'' L "Deterministic" sta- 
tus is achieved quickly and maintained almost exclusively 
throughout. Similar results are obtained for reactions R4 
and i?5 (data not shown) while i?i and R2 are classified 
as ES at all steps of the simulation. 

In Fig. [H we compare the average numbers of Simula- 
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tion steps and the CPU times required for PLA and SSA 
simulations of (HH). Again we see that the CPU times 
generally follow the same trends as the simulation steps 
but, contrary to what is observed in Sec. IIV Al the SB 
calculations require more steps, and longer CPU times, 
than the RB. The efficiencies of the two r-selection pro- 
cedures thus appear to be system-dependent. This is an 
interesting result and an area of possible future investi- 
gation. 

We also see in Fig. |4] that the general trend for the 
simulation steps is an initial increase with system size, 
followed by a drop, another increase and then another 
drop with an eventual leveling off (the only departure 
from this is for the PLA-RB-3% where the number of 
steps initially decreases with system size). The initial 
behavior is the same as that seen for the simple clus- 
tering model (see Fig. [T]) and can easily be explained in 
terms of a competition between system size and leaping 
effects. The second increase is unique to this example, 
however, and arises from an increasing fraction of simu- 
lations requiring large numbers of steps. At 10"^^° L, for 
example, 95% of all PLA-RB-1% simulations require be- 
tween 38 078 and 437061 steps to complete. At lO'^^ L, 
the range is 38 081 and 142 687. The stochastic nature of 
the gene expression dynamics thus results in a wide vari- 
ety of possible time-evolution trajectories for systems of 
size 10^^^ to 10^^ L, some of which require many more 
simulation steps to complete than others. The ability of 
the PLA to handle systems in this "medium" size range 
is a particular strength of the method. 

Finally, in terms of accuracy, we generated smoothed 
frequency histograms for each species in p2p at vari- 
ous system sizes and again observed excellent agreement 
between the PLA and SSA. In all cases, the calculated 
histogram distances were smaller than the SSA self dis- 
tances. 



V. DISCUSSION AND CONCLUSIONS 

In this article, we have presented a new multiscale sim- 
ulation approach, termed the "partitioned leaping algo- 
rithm," for modeling stochasticity in chemical reaction 
networks. Clearly, this method is closely related to r- 
Ieapingii^iiii2£i2^ The key difference is that reactions are 
classified individually in the PLA, which leads to vari- 
ous advantages over r-leaping: The PLA overcomes the 
problem of negative populations more simply, the SSA is 
incorporated into the algorithmic framework in a more 
seamless and natural way and reactions are treated at 
four levels of description rather than two. 

The negative populations problem is avoided in the 
PLA primarily through the structure of the algorithm. 
This can be seen most easily by considering the RB r- 
selection procedure ([9]), though the arguments hold for 
the SB approach as well. If a reaction has a small reac- 
tant population, such that the reaction would be deemed 
"critical" in r-leaping, then its t^^'^p value will be 




.|Q-15 .|Q-13 

System Size (L) 



FIG. 4: Average numbers of steps (a) and CPU times (b) 
required for 10 000 PLA and SSA simulation runs of the sim- 
ple gene expression model (|12p . Reaction classifications were 
made in the PLA runs using « 1 = 3 and S> 1 = 100. All 
simulations were performed on a 1.80 GHz Athlon processor. 



< 1/a^, the expected time of its next firing. This is be- 
cause, for very small reactant populations, a single firing 
will change by a fractional amount > e, which will be 
automatically recognized in the r-selection process. As a 
result, the reaction will be classified as ES (since t is al- 
ways <Tj^°'^^) and thus prevented from firing more than 
once, assuring that its reactant populations do not fall 
below zero. Since this outcome arises as a natural con- 
sequence of the design of the PLA, there is no need for 
additional calculations or parameters, such as those asso- 
ciated with the critical reaction search in T-leaping] ^°'^^ 
Another difference between the methods is that in r- 
leaping the r-selection formulas © and pU]) are applied 
only to non-critical reactions In the PLA, these for- 
mulas are applied to all reactions. This is merely a su- 
perficial difference, however. Although critical reactions 
are treated separately from non-critical reactions in r- 
leaping, they still contribute to r-selection, just in a dif- 
ferent way. In r-leaping, "candidate" time steps are cal- 
culated for the critical and non-critical reaction subsets 
and r set equal to the smaller of the two.^'^'^^ The criti- 
cal reaction time step is calculated using Eq. ^ with ao 
replaced by ag, the sum of the critical reaction propen- 
sities. Obviously, this will give a time interval on the 
order of 1/Max{aj^}. In the PLA, these same reactions 
will be treated using Eqs. (O or (fTU)) . As just discussed. 
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in the PLA-RB scenario the result will be t^^'^p values 
on the order of 1/a^. t is set equal to Mm{T^^'^P}, so the 
overall result is essentially the same: the "critical" reac- 
tions contribute to the r-selection process a time interval 
on the order of 1/Max{a^}. The randomness associated 
with these reactions is then added in in Step 4 of the 
PLA using the methods of the NRM^^ [Eqs. and (g])], 
which is the obvious choice given the structure of the 
algorithm. 

The ability of the PLA to treat reactions at the 
continuous-stochastic and deterministic scales, rather 
than just the discrete-stochastic, is an important at- 
tribute of the method as well; the PLA could be seen 
as unifying, into one overarching approach, three differ- 
ent types of simulation methods: exact-stochastiC ) r- 
leapingfi^i2^ and methodology coupling, or hybrid, tech- 
niques Hybrid methods operate by parti- 
tioning reactions into "fast" and "slow" subsets and us- 
ing, e.g., deterministic reaction rate equations or stochas- 
tic differential equations, to describe the fast reactions 
and the SSA (or modified versions thereof) for the slow. 
The primary shortcoming of these approaches is the 
lack of a sound theoretical basis for the partitioning 
scheme, which calls into question the extent of their 
utility. The partitioning procedure described in Sec. lIII Al 
overcomes this problem; partitioning is accomplished via 
Gillespie's rigorously derived criteria for transitioning be- 
tween the descriptions ©-([T]).^^''^^ Of course, as cur- 
rently formulated, "Langevin" and "Deterministic" re- 
actions are treated in a manner equivalent to the explicit 
Euler method for solving stochastic and ordinary differ- 
ential equations, respectively, which may not be satisfac- 
tory in all cases. 

Recognizing this, Petzold and co-workers have devel- 
oped various "implicit" r-leaping method a^^'^^i^^ that 
take into account the values of the propensities at both 
the beginning and end of the time step r. By doing so, 
these methods are able to take larger time steps than ex- 
plicit methods^^'^^ that only consider the propensity at 
the beginning of the step (the algorithm presented here is 
explicit). This comes at a cost, however: Implicit meth- 
ods dampen fluctuations in the species populations and, 
hence, underestimate the amplitude of the noise F^SiiSii^ 
Nevertheless, the ability of these methods to maintain 
stability in situations where explicit methods fail has 
been demonstrated. Incorporating these ideas into the 
PLA is an interesting possibility, and an area of possible 
future investigation. 

From a practical point of view, it is unclear to what 
extent the efficiency of the PLA is actually improved by 
including the "Langevin" and "Deterministic" classifi- 
cations. As mentioned above, the reason for including 
these is to presumably improve computational efficiency 
via faster generation of normal random numbersi^ or by 
eliminating random number generation altogether. Their 
inclusion adds complexity to the method, however. To 
what extent this attenuates the gains in efhciency re- 
mains to be seen, and is something we plan on investi- 



gating further in the future. Note, however, that a use- 
ful feature of the PLA is its ability, via manipulation 
of the classification parameters, to force a simulation at 
a particular level of description (e.g., if ^ !,« 1 = oo, 
then all reactions will be classified as ES at all steps of a 
simulation - See Sec. IIII Ap . With the coarse classifica- 
tions included, one could use this feature (with rounding 
turned off) to perform, e.g., deterministic simulations of 
a system for direct comparisons to results from the PLA. 

Finally, we must also acknowledge the primary short- 
coming of the PLA, namely, in handling systems with 
widely disparate rate constants. Consider a system that 
contains a single entity, such as a gene, experiencing rapid 
ON/OFF or binding/unbinding behavior. Because there 
exists only one entity these reactions will be flagged as 
ES, and because of the large rate constants their t^^^^p 
values (in the PLA-RB) will be small. The result will be 
small time steps and a "bogging down" of the PLA. Note 
that this is a problem in r-leaping as well. 

The only way to overcome this problem is to "reduce" 
the model so that the effects of the fast reactions are ac- 
counted for in some approximate way. This is a com- 
mon approach; examples include quasi-steady-state or 
Michaelis-Menten reductions. Recently, more advanced 
model reduction schemes have been proposed that can 
be applied to generalized reaction networks^ii^^i^i^i and, 
in some cases, dynamically] '^^ Since these methods are 
specifically designed to overcome problems associated 
with widely disparate rate constants they could form an 
effective complement to leaping techniques that tackle 
the problem of widely disparate species populations. Fu- 
ture techniques combining automatic and dynamic model 
reduction with partitioned leaping could open the door 
to computational investigations far beyond the reach of 
current approaches. 
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APPENDIX: HISTOGRAM SMOOTHING, 
HISTOGRAM DISTANCE AND SELF DISTANCE 

For a set of N data points {xi, X2, ■ ■ ■ , xn}, the total 
number falling within a discrete interval [x,x + A) can 

be formally written as J2iLi Ix^^ ^i^i ~ x')dx', where 
6{xi — x') is the Dirac delta function and the integral 
equals unity if Xi lies within [a;, a; -I- A) and zero otherwise. 
A "histogram density" can then be obtained by dividing 



10 



J2 

SI 

o 



n nnfi 




• PLA-RB-5%: D=0.380 
■ PLA-RB-37o: D=0.162 

• PLA-RB-1%: D=0.028 

• SSA: E[D^^"]=0.038 


0.004 




















0.002 






n 






















■MM 





13,400 



13,800 

X2(t=10) 



14,200 



FIG. 5: Smoothed population histograms for species 52 at 
t — 10 obtained from 10 000 simulation runs of the "decaying- 
dimerizing" reaction set^^ using the SSA and the PLA-RB 
with various values of e. Reaction classifications were made 
in the PLA runs using « 1 = 3 and ^ 1 = 100. All histograms 
were generated using Eq. (|A.3|I with a = 15. 



this quantity by iVA and taking the hmit as A ^ 0, 

M^)=Hm— 5{x,-x')dx'. (A.l) 



The "hat" in h(x) signifies that this quantity is a statis- 
tical estimator of the true histogram density 



h(x) = Ihn h{x). 

N^oo 



(A.2) 



A "smoothed" histogram can then be obtained by ap- 
proximating the delta function in (jA.ip by a finite width 

Gaussian Kexp ^ ~^^2(t'^ ^ ) ' where k is a normalization 
constant and is the (user-defined) variance. Substitut- 
ing into (jA.ip . noting that to first order j^^^ e^^^ du — 
Ae^^ , and normalizing, gives 



h{x) ^ 



1 



2TTaN 



N 

^exp 



-{Xi 



2(72 



(A.3) 



All smoothed histograms generated in this work were 
obtained using this expression. Examples are shown in 
Fig. [5l obtained from 10 000 PLA and SSA simulations 
of the "decaying-dimerizing" reaction set using the same 
rate parameters and initial conditions as in Ref. T?. 

In order to quantitatively compare results obtained via 
diiferent simulation methods (i.e., PLA, SSA, etc.) we 
use the "histogram distance" discussed by Cao and Pet- 
zold.^'* The quantity 6hx is defined as hi{x) — h2{x) and 
the histogram distance D is then 



D \ 



\5hx\dx. 



(A.4) 



The factor of 1/2 assures that D lies within [0,1), with 
constituting a perfect fit and 1 a complete mismatch. 

It is important to note that D is defined in (|A.4|) in 
terms of the true histogram densities hi{x) and h2{x). 
In practice, we only have their estimators and can thus 



only calculate an estimated value of D. As a result, a cer- 
tain amount of statistical uncertainty is associated with 
the comparison of histograms. In order to quantify this 
uncertainty Cao and Petzold^"' introduce the "self dis- 
tance," which essentially measures the distance 
between the estimator h{x) and the true density h{x). 
Expressions for the upper bounds on the mean and vari- 
ance of D^'^^^ are presented in Ref. in terms of the 
number of bin intervals K . However, since we are using 
Eq. (|A.3|) rather than a counting procedure to generate 
h{x) we must derive alternate expressions. 
We begin by defining 



^sclf 



IJjShf'ldx, 
h{x) — h{x). 



(A.5) 



Following Cao and Pctzold,^'* we then note that the num- 
ber of points falling within the interval [x, x -f A) is a bi- 
nomial random variable B{px, N), where is the success 
probability. h{x) can then be written as 

h{x) = ^m^B{p^,N)/NA = B{dp^, N)/Ndx. (A.6) 

Since the mean E[B{px, N)] — Np^ and the variance 
YaT[B{px, N)] = NpxQx [q-x = 1 - Px), the mean and 
variance of h{x) are dpx/dx and dpxdqx/Ndx^ , respec- 
tively. With dpx = h{x)dx and dqx = 1 — h{x)dx, the 
mean and variance of (5/i|°'^ are then 

nShf'] = 0, (A.7) 
variShf^] = [h{x)dx][l - h{x)dx]/Ndx^ h{x)/Ndx, 

where the last line utilizes the histogram density estima- 
tor h(x). 

For large N, (5/i|°'^ can be approximated as a normal 
random variable with mean zero and variance h{x)/Ndx. 
This means that 



5h 



self 



(A., 



ih{x)/Ndx 
is approximately standard normal and 

h{x)/Ndx 



(A.9) 



is approximately chi distributed with one degree of free- 
dom. Since the mean and variance of a chi random vari- 
able with one degree of freedom are y/2/Tr and (tt — 2) /tt, 
respectively, this gives 



n\shf'\] 

var[|<5/i-»|] 




(A.IO) 
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Finally, using the identities^i 

E[JJ{x)dx] = / E[/(x)]dx, 

J X 

Y^v[jJ{x)dx\ < (^j^ VVar[/(x)]dx) , 

we get 



In practice, we calculate E[I?^^f^], the self distance for 
a reference histogram generally obtained using the SSA. 
This value then tells us that any histogram with a dis- 
tance D < E[D^°f] cannot be statistically differentiated 
from the reference histogram. In Fig. [S] we see that only 
the PLA-RB-1% histogram achieves this level of accu- 
racy. The expression for var[Z?''°'^] is included here for 
completeness. 
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